[57] ABSTRACT 

An electronic circuit is used to compare two sequences, such 
as genetic sequences, to determine which alignment of the 
sequences produces the greatest similarity. The circuit 
includes a linear array of series-connected processors, each 
of which stores a single element from one of the sequences 
and compares that element with each successive element in 
the other sequence. For each comparison, the processor 
generates a scoring parameter that indicates which segment 
ending at those two elements produces the greatest degree of 
similarity between the sequences. The processor uses the 
scoring parameter to generate a similar scoring parameter 
for a comparison between the stored element and the next 
successive element from the other sequence. The processor 
also delivers the scoring parameter to the next processor in 
the array for use in generating a similar scoring parameter 
for another pair of elements. The electronic circuit deter- 
mines which processor and alignment of the sequences 
produce the scoring parameter with the highest value. 

14 Claims, 18 Drawing Sheets 
SEQUENCE INFORMATION SIGNAL 
PROCESSOR 

This is a continuation of application Ser. No. 08/154,633, 
filed Nov. 18, 1993, now U.S. Pat. No. 5,632,041, which is 
a continuation of Ser. No. 07/518,562, filed May 2, 1990. 

The invention described herein was made in the perfor- 
mance of work under the following contracts: NASA con- 
tract NAS7-918; DOE contract DEFG03-88er60683; NSF 
contracts DIR-8809710 and DMS-8815106; and NIH contract 
GM 36230, and is subject to the provisions of Public 
Law 96-517 (35 USC 202) in which the Contractors have 
elected to retain title. 

ORIGIN OF INVENTION 

1. Technical Field 

The present invention relates generally to an integrated 
circuit developed primarily in support of the human genome 
effort which is a molecular genetic analysis for mapping and 
sequencing the human genome. The present invention 
relates more specifically to an integrated circuit co-processor 
which may be used for carrying out an algorithm for 
identifying maximally similar sequences or subsequences 
and for locating highly similar segments of such sequences 
or subsequences. 

2. Background Art 

Release 63.0 of the national nucleic acid data base, 
Genbank, contains over forty million nucleotides represent- 
ing about thirty-three thousand separate entries. Similarly, 
the current protein information resource (PIR) has close to 
six thousand entries with over one and one-half million 
amino acids. These data reflect primarily the efforts of the 
molecular biology community over the last decade. The rate 
at which new data are being added to this total demonstrates 
that the available computing resources are already inad- 
equate for thorough and timely analysis of the data. 
Recently, an international commitment has been made to 
map and sequence the entire human genome in the next 10 
to 20 years. Such a program will generate at least 3.4 billion 
nucleotides of final data and maybe ten times that amount of 
raw sequencing data. This constitutes about three orders of 
magnitude more data than has been collected to date. In 
addition, the sequences from other animal and plant 
genomes will also accumulate. In the near term, the 40 
million nucleotides currently available, already proving 
burdensome, will become trivial by comparison to the total. 
Novel computer resources must be developed if these data 
are to be adequately understood and their unique potential 
for enhancing our understanding of human genetics and 
diseases is to be realized. 

A required adjunct to any program designed to charac- 
terize the human genome is the development of computer 
hardware and software systems capable of maintaining and 
analyzing the vast amounts of information that will be 
generated. This information will consist of both nucleotide 
and amino acid sequence data as well as extensive annota- 
tion necessary to provide a biological context for these data. 
It is critical for the complete and timely analysis of new 
sequence data that they be thoroughly compared to the 
published data contained in the national data libraries. This 
analysis is important for determining and defining the func- 
tional and evolutionary relationships between sequences. 
Significantly, such sequence comparison is also critical to 
the task of constructing the complete genome sequence from 
millions of partially overlapping fragments, the so-called 
melding process. The computational load of this melding 
process will grow not only at the national level of coordinating 
the efforts of many researchers, but also at the level 
of individual laboratories that must deal with the increasing 
load of raw data generated by the development of automated 
sequencing technologies. 

The ability of individual investigators to analyze their 
own data is limited by the power of the computers they have 
available, as well as the limited software tools capable of 
dealing with the entire sequence library. The amount of total 
sequence data generated to date is still less than 50 million 
character equivalents. However, this amount of data already 
taxes the ability of currently available algorithms and gen- 
eral use computers to conduct the needed comparative 
analysis of new data to the collected total. The data libraries 
have been doubling in size every year. The program that is 
envisioned to characterize complete genomes will soon 
cause the data libraries to increase exponentially. Such 
programs will also change the basic nature of the collected 
data and consequently the requirements for effective tools 
for its analysis. 

In the latest Genbank release, the average length of an 
individual entry is on the order of one thousand bases. Many of 
the current methods of analyzing this data are based on the 
notion that each entry represents a discrete genetic element. 
However, this scenario does not adequately represent the 
more diffuse and complex organization of a eukaryotic 
genome, where the coding and regulatory elements of a 
simple gene can span more than one million bases. More 
complex loci, such as those coding for the rearranging 
receptors of the immune system, can span over one million 
bases and include hundreds or thousands of identifiably 
related elements. As more and larger sequencing efforts are 
undertaken, the complexity of information contained in 
single entries will require a novel set of maintenance and 
analytical tools. 

The human beta globin locus is a good example. Its entry 
in Genbank is over 73 thousand bases long and has been 
constructed from over 70 overlapping contributions. This 
single entry contains the coding and regulatory information 
for at least 4 genes and 1 pseudogene. The repetitive nature 
of much of the genome will also severely complicate the 
alignment and melding problems. With megabase sequenc- 
ing projects, the current concept of data entry will become 
obsolete. Not only will faster algorithms to compare 
sequences be needed as the amount of data increases, but 
these new tools will also have to be designed to better deal 
with longer strings of data that more directly reflect true 
genomic organization. Accordingly, novel schemes to 
handle and define these data and the biological information 
associated with them must be developed if this resource is to 
be useful to the scientific community. 

Of the many pressing analytical needs concerning the 
current sequence data libraries, as well as the genome 
project, initially the most significant is the ability to survey 
the existing collection of data for sequences related to the 
new data. In its simplest form, this need is illustrated by 
searching the collection of gene or protein sequences for any 
that are “similar” to a discrete piece of new data. The 
comparative analyses possible between related sequences 
are critical for completely understanding the structural, 
functional and evolutionary characteristics of any sequence. 
Furthermore, in the case where large portions of the human 
genome are known, it will also be necessary to have the 
ability to find the precise genetic location of physiological 
markers in those cases where there may be only limited 
cDNA or protein sequence data available. 

Such searches are complicated by the fact that related 
sequences may be quite divergent. This means that it is 
essential to define some measure of similarity between pairs 
of sequences that can then be tested statistically. The explicit 
series of minimal evolutionary events (substitutions, 
deletions, insertions) between two sequences must be deter- 
mined; i.e., the sequences must be aligned. Traditionally, the 
most common method of alignment has been by eye, relying 
on the researcher’s ability to recognize conserved patterns. 
This method can be rapid and effective when the sequence 
distance is relatively small and/or the researcher has a priori 
information about the probable nature of the alignment. For 
example, many new members of the immunoglobulin gene 
superfamily have been identified and aligned to other mem- 
bers on the basis of a very limited, but well-defined set of 
conserved features. However, it is certainly no longer pos- 
sible for any investigator to reliably compare a novel 
sequence against a significant portion of the existent data 
base. 

It is possible in theory to generate every possible combination 
of genetic events between two sequences, score each 
one and discover the most similar. In practice, however, this is 
impossible for all but the shortest sequences, as the number of 
combinations increases exponentially with the length of the 
sequences. Some investigators have implemented rule-based 
methods by which, given a reasonable starting alignment 
point, gaps and insertions are included according to a very 
restricted set of possibilities. These methods can be rela- 
tively rapid, but, like manual alignment, are non-rigorous 
methods as they cannot predictably guarantee that the results 
represent the optimal minimum distance, that is, the mini- 
mum evolutionary distance between two sequences or the 
series of events that provides the smallest weighted sum 
required to transform one sequence into the other. 

When the assumption is that two sequences are generally 
similar along their entire length, the alignment process is 
considered to be global in nature. However, an alignment 
proceeding from this premise can fail to recognize more 
limited regions of similarity between two otherwise unre- 
lated sequences. What is required then is the ability to find 
all regions of local alignment. For example, if an investi- 
gator has a new sequence related to a human beta globin 
gene, such as one from another species, the need is to be able 
to find the local alignment of that more limited sequence to 
some particular portion of the 73 thousand bases of the 
known beta globin locus. The same concerns are manifest in 
the melding problem. By definition, most overlapping 
sequences will only share a limited region of identity, 
illustrating a local alignment problem. 

In 1970, S. B. Needleman and C. D. Wunsch authored a 
paper entitled “A General Method Applicable To The Search 
For Similarities In The Amino Acid Sequence Of Two 
Proteins”, which was published in the Journal of Molecular 
Biology, Volume 48, Page 444. Their paper has had a great 
deal of influence in biological sequence alignment. Its 
particular advantage is that an explicit criterion for optimal- 
ity of alignment is stated and an efficient method of solution 
is given. Insertions, deletions and mismatches were allowed 
in the alignments. The method of Needleman and Wunsch fits 
into a broad class of algorithms, commonly referred to as 
dynamic programming. The general category of dynamic 
programming alignment of two sequences is discussed at 
length in a text entitled “Mathematical Methods for DNA 
Sequences” and particularly Chapter 3 thereof, entitled 
“Sequence Alignments” written by Michael S. Waterman, of 
the University of Southern California, a co-inventor of the 
present invention. 

In 1980, Dr. Waterman, then with the Los Alamos Sci- 
entific Laboratory, collaborated with T. F. Smith, then a 
Professor at Northern Michigan University, in publishing a 
letter entitled “Identification of Common Molecular Subse- 
quences” which appeared in the Journal of Molecular 
Biology, Volume 147, pages 195-197, 1981. In this letter, 
Waterman and Smith defined a new algorithm, the intention 
of which was to find a pair of segments, one from each of 
two long sequences, such that there was no other pair of 
segments with greater similarity (or “homology”). The algo- 
rithm produced a similarity measure which allowed for 
arbitrary length deletions and insertions. 

In a more recent publication, entitled “A New Algorithm 
for Best Subsequence Alignments With Application to 
tRNA-rRNA Comparisons”, in the Journal of Molecular 
Biology, Volume 197, pages 723-728 (1987), Waterman and 
Mark Eggert describe the efficiency of the algorithm of 
Smith and Waterman for identification of maximally similar 
subsequences. The article describes a use of the algorithm 
in which the best alignment is produced first, and small 
modifications are then made to the matrix to produce 
subsequent, non-intersecting alignments. The 
algorithm is applied to comparisons of tRNA-rRNA 
sequences from Escherichia coli. A statistical analysis 
therein shows results which differ substantially from the 
results of an earlier analysis by others and, furthermore, that 
the algorithm is much simpler and more efficient than those 
previously in use. 

The need for low cost, high speed data sequence comparisons 
cannot be met even with current supercomputers 
because of existing data base size. There is therefore an 
existing need to provide an electronic circuit device for 
carrying out subsequence alignments of molecular 
sequences or global alignment thereof and more specifically 
for a sequence information signal processor designed to 
carry out a dynamic programming algorithm which is both 
effective and efficient in identifying subsequence or global 
alignments of molecular information. 

SUMMARY OF THE INVENTION 

The present invention comprises a sequence information 
signal processing integrated circuit chip designed to perform 
high speed calculation of a dynamic programming algorithm 
based upon Waterman and Smith. The signal processing chip 
of the present invention is designed to be a building block of 
a linear systolic array, the performance of which can be 
increased by connecting additional sequence information 
signal processing chips to the array. The chip provides a high 
speed, low cost linear array processor that can locate highly 
similar segments or contiguous subsequences from any two 
data character streams (sequences) such as different DNA or 
protein sequences. The chip is implemented in a preferred 
embodiment using CMOS VLSI technology to provide the 
equivalent of about 400,000 transistors or 100,000 gates. 
Each chip provides 16 processing elements, operating at a 
12.5 MHz clock frequency. The chip is designed to provide 
16 bit, two’s complement operation for maximum score 
precision of between -32,768 and +32,767. It is designed to 
provide a comparison between sequences as long as 
4,194,304 elements without external software and between 
sequences of unlimited numbers of elements with the aid of 
external software. 
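
The figures quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the score range follows directly from 16 bit two's complement representation, while the throughput figure rests on the assumption (not stated in this passage) that each processing element completes one comparison per clock period. All variable names are hypothetical.

```python
# Figures quoted in the text above.
SCORE_BITS = 16
PROCESSORS_PER_CHIP = 16
CLOCK_HZ = 12_500_000  # 12.5 MHz

# Range of a signed 16 bit two's complement score.
score_min = -(1 << (SCORE_BITS - 1))
score_max = (1 << (SCORE_BITS - 1)) - 1

# ASSUMPTION: each processing element completes one comparison per clock.
comparisons_per_second = PROCESSORS_PER_CHIP * CLOCK_HZ

print(score_min, score_max, comparisons_per_second)
```

Under that assumption a single chip would perform on the order of two hundred million element comparisons per second, and cascading further chips would scale this figure linearly.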

The sequence information signal processor chip of the 
present invention permits local and global similarity 
searches, that is, subsequence and full sequence alignment. It 
provides user definable gap/insertion penalties; user definable 
similarity table contents; user definable threshold values 
for score reporting; a user definable character set of up to 
128 characters; user definable sequence control characters 
for streamlined data base processing; variable block size for 
low or high resolution similarity searches; unlimited 
sequence length and numbers of blocks; on-chip 
block maximum score calculation; and an on-chip maximum 
score buffer to relieve control processor data collection. It 
provides linear speedup by being configured for cascading 
more such chips and it provides threshold control with 
boundary score reset. The chip also provides for programmable 
data base operation support; block maximum value 
and location calculation and buffering; user-definable query 
threshold and preload threshold; and built-in self test and 
fault bypass. 

It will be seen hereinafter that each of sixteen processor 
elements on a sequence information signal processing integrated 
circuit chip of the present invention provides the 
circuitry to compare the sequence characters of a matrix H, 
based upon a novel modification of the Smith and Waterman 
Algorithm for two sequences. Circuitry is also provided for 
defining the degrees of similarity of two sequences so that 
different linear deletion functions can be defined for each of 
the two sequences and different similarity weights can be 
defined for each character of the query sequence. 

In its preferred embodiment, the chip of the present 
invention is configured as a 208 pin, CMOS VLSI integrated 
circuit device. 

OBJECTS OF THE INVENTION 

It is therefore a principal object of the present invention 
to provide a sequence information signal processing system 
on a single integrated circuit chip for performing a best 
subsequence and global alignments algorithm at high speed, 
at low cost and with optimum parameter control. 

It is an additional object of the present invention to 
provide an integrated circuit chip having highly integrated 
VLSI technology for ascertaining the similarity between two 
segments of two different DNA or protein sequences by 
performing a best subsequence alignment algorithm. 

It is still an additional object of the present invention to 
provide an integrated circuit chip having a plurality of 
processors thereon, each such processor being designed to 
carry out an algorithm for providing scoring of the relative 
alignments of sequence segments for such uses as biological 
information signal processing, speech recognition, 
cryptology, geological strata analysis, handwriting 
recognition, large text database searches and other applica- 
tions which require the comparison of multiple sequences of 
data. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The aforementioned objects and advantages, as well as 
additional objects and advantages thereof, will be more fully 
understood hereinafter as a result of a detailed description of 
a preferred embodiment when taken in conjunction with the 
following drawings in which: 

FIG. 1 is a graphical illustration of the matrix elements of 
the algorithm of the present invention and illustrating a 
projection technique for reducing the number of real time 
processors for carrying out the algorithm; 

FIGS. 2-9 illustrate sequential snapshot representations 
of the algorithm steps of the present invention in a four-by- 
four exemplary matrix; 

FIG. 10 is a graphical schematic illustration of the manner 
in which the architecture of a processor of the present 
invention performs the algorithmic steps for a particular 
matrix element; 
FIG. 11 is a generalized, functional block diagram of a 
processor of the present invention; 

FIGS. 12 and 13, when taken together, represent a block 
diagram of an actual processor of the present invention; 

FIGS. 14 and 15, when taken together, constitute a 
schematic block diagram of the chip circuit of the present 
invention; 

FIG. 16 is a layout schematic illustrating the physical 
configuration of the signal processing chip of the invention; 
and 

FIGS. 17 and 18 taken together provide a dependence 
graph mapping for multiple chips representing a total of 34 
processors of the present invention. 

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT 

The information signal processor integrated circuit chip of 
the present invention is designed to compare two sequences, 
such as two molecular sequences, and to determine their 
similarity by ascertaining the best score of any alignment 
between such sequences. A preferred embodiment of the 
invention illustrated herein is designed to perform this 
sequence comparison by carrying out the previously iden- 
tified Smith and Waterman algorithm. Accordingly, the 
method and apparatus of the present invention may be best 
understood by first understanding the algorithm on which it 
is based and which comprises the following: 

For two sequences $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$, the 
best (largest) score from aligning A and B is $S(A,B)$. $H_{i,j}$ is 
defined as the best score of any alignment ending at $a_i$ and 
$b_j$, or 0. So, 

$$H_{i,j} = \max\{0;\ S(a_x a_{x+1} \ldots a_i,\ b_y b_{y+1} \ldots b_j);\ 1 \le x \le i,\ 1 \le y \le j\}.$$

The similarity measure between sequence letters a and b 
is $s(a,b)$ where, 

$$s(a,b) > 0 \text{ if } a = b$$

$$s(a,b) < 0 \text{ for at least some cases of } a \ne b.$$

The similarity algorithm is started with: 

$$H_{i,0} = H_{0,j} = 0,\quad 1 \le i \le n,\ 1 \le j \le m.$$

Then: 

$$H_{i,j} = \max\{0,\ H_{i-1,j-1} + s(a_i, b_j),\ E_{i,j},\ F_{i,j}\}$$

where: 

$$E_{i,j} = \max\{H_{i,j-1} - (u_E + v_E),\ E_{i,j-1} - v_E\}$$

$$F_{i,j} = \max\{H_{i-1,j} - (u_F + v_F),\ F_{i-1,j} - v_F\}$$
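
The recurrences above can be exercised directly in software. The following Python sketch is illustrative only and is not the circuit described herein: it fills the H, E and F matrices row by row. The function name, the example similarity function (+2 for a match, -1 for a mismatch), the gap constants, and the zero boundary values chosen for E and F are all assumptions, since the text specifies boundary values only for H.

```python
def local_alignment_scores(a, b, s, u_e, v_e, u_f, v_f):
    """Fill H for sequences a and b using the H/E/F recurrences.

    ASSUMPTION: E and F are initialized to 0 on the boundary; the text
    only states that H is 0 on the boundary.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # E extends or opens a gap along the same row (same a_i).
            E[i][j] = max(H[i][j - 1] - (u_e + v_e), E[i][j - 1] - v_e)
            # F extends or opens a gap along the same column (same b_j).
            F[i][j] = max(H[i - 1][j] - (u_f + v_f), F[i - 1][j] - v_f)
            # H is never allowed to drop below zero (local alignment).
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          E[i][j],
                          F[i][j])
    return H


def example_similarity(x, y):
    """Hypothetical similarity table: +2 for a match, -1 otherwise."""
    return 2 if x == y else -1


H = local_alignment_scores("ACGT", "ACGT", example_similarity, 2, 1, 2, 1)
best_score = max(max(row) for row in H)
print(best_score)
```

The largest entry of H is the best local alignment score; its row and column indices identify the elements at which the best-scoring segments end.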

From the above, it will be seen that each processor for 
determining the best score $H_{i,j}$ of an alignment ending at $a_i$ 
and $b_j$ must provide parameters for the calculation of $H_{i+1,j}$; 
$H_{i,j+1}$; and $H_{i+1,j+1}$. This requirement for generating parameters 
for subsequent best score calculation processes may be 
better understood by reference to FIG. 1, which for purposes 
of example, illustrates a four-by-four matrix of calculations 
for n=4 and m=4. It will be seen in FIG. 1 that each 
alignment comparison process is represented by a circle 
having within it elements of the two sequences, A and B, at 
which the respective alignments are being scored. It will also 
be seen in FIG. 1, that parameters are passed either from left 
to right or from top to bottom or diagonally from upper left 
to lower right from each alignment process circle to the 
others in the matrix in order to carry out the algorithm of the 
present invention. Thus for example, it will be seen in FIG. 
1, that the best score for the alignment ending at $a_2$ and $b_2$ 
receives the parameter $H_{1,1}$ from the $a_1,b_1$ comparison 
process; receives $H_{1,2}$ and $F_{1,2}$ from the $a_1,b_2$ comparison 
process; and receives the $H_{2,1}$ and $E_{2,1}$ parameters from the 
$a_2,b_1$ process. All of these parameters are, in accordance 
with the Waterman and Smith algorithm, required to generate 
$H_{2,2}$, which is defined as the best score of the alignment 
of the A and B sequences ending at $a_2$ and $b_2$. 

It will also be seen in FIG. 1, that as a result of the 
computation carried out by the process at $a_2,b_2$, parameters 
$H_{2,2}$, $E_{2,2}$ and $F_{2,2}$, all resulting from the best score alignment 
computation at $a_2,b_2$, are transferred as required to each 
of the three subsequent comparisons $a_2,b_3$; $a_3,b_2$; and $a_3,b_3$. 
Based upon the need for the generation of parameters for 
best score alignment comparisons for previous values of $a_i$ 
and $b_j$ in the sequences of A and B, it will be seen that not 
all of the best score alignment computation processes can be 
carried out simultaneously. Thus for example, best score 
computation for $a_1,b_2$ and $a_2,b_1$ must await the results of the 
computation process for $a_1,b_1$. Similarly, the computation 
process for $a_2,b_2$ must await the results of the computation 
processes for $a_1,b_1$; $a_2,b_1$; and $a_1,b_2$. Consequently, it would 
be entirely inefficient to perform the algorithm depicted in 
FIG. 1 for an exemplary four-by-four matrix with a separate 
processor for each combination of $a_i$ and $b_j$. On the contrary, 
it would be most efficient to use only that number of 
processors which equals the maximum number of processors 
being used at any one time, based upon the sequence 
of parameter generation required, as shown in FIG. 1. 
Accordingly, as seen in the right most portion of FIG. 1, the 
Smith and Waterman algorithm for a four-by-four matrix, 
that is for $A = a_1,a_2,a_3,a_4$ and $B = b_1,b_2,b_3,b_4$, may be 
carried out by four computation processors with appropriate 
interconnections to assure the transfer of necessary parameters 
from processor to processor. 
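
The dependency argument above can be made concrete by grouping the matrix cells according to the earliest step at which all of their inputs are available. The following Python sketch is illustrative only; the function name and the step numbering (step i + j - 1 for the cell pairing the i-th element of A with the j-th element of B) are assumptions introduced for this example.

```python
def wavefronts(n, m):
    """Group cells (i, j), 1-based, by the step at which they can be computed.

    A cell needs its left, upper and upper-left neighbours, so cell (i, j)
    becomes computable at step i + j - 1 (ASSUMED numbering, starting at 1).
    """
    steps = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            steps.setdefault(i + j - 1, []).append((i, j))
    return [steps[t] for t in sorted(steps)]


fronts = wavefronts(4, 4)
busiest = max(len(front) for front in fronts)
for step, front in enumerate(fronts, start=1):
    print(step, front)
```

For the four-by-four example no step ever contains more than four cells, which is why four processors suffice in place of sixteen.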

In the language of VLSI array processor design, the 
left-most portion of FIG. 1 is referred to as a systolic parallel 
processor array and the right-most portion of FIG. 1 is 
referred to as a signal flow graph. The technique for mapping 
algorithms into systolic parallel processor arrays and the 
technique for projecting such graphs into signal flow graphs 
may be understood best by referring to the text entitled VLSI 
Array Processors by S. Y. Kung, published by the Signal and 
Image Processing Institute of the University of Southern 
California, Copyright 1986. 

The signal flow graph of the right side of FIG. 1 illustrates 
that the systolic processor array graph on the left side 
may be horizontally projected into a signal flow configuration 
which requires only four processor elements to carry out 
the four-by-four matrix algorithm. For the example 
shown in FIG. 1, each such processor on the right-most 
portion of FIG. 1 is permanently associated with an element 
of the A sequence, namely $a_1$, $a_2$, $a_3$ and $a_4$, respectively. On 
the other hand, the B sequence elements, namely $b_1$, $b_2$, $b_3$ 
and $b_4$, respectively, are sequentially applied in a serial 
manner through the elements so that the first alignment best 
score computation occurs at $a_1,b_1$. 

The lines with arrow heads associated with each of the 
elements in the right-most portion of FIG. 1 represent 
parameter values that are either transferred from element to 
element in series or are fed back and used in the same 
element for the next computation. More specifically, FIG. 2 
represents a combined systolic array graph and horizontal 
projection graph at a “snapshot” in time at which the $a_1,b_1$ 
alignment computation is taking place as represented by the 
dashed line through the $a_1,b_1$ processor in the left portion of 
FIG. 2. The $b_1$ signal has been applied to the first processor 
to permit the computation of the score ending at $a_1,b_1$. The 
parameter values emanating from this first sequence computation 
are represented by the arrow head lines emanating 
from the first processor element shown therein at the right 
most portion of FIG. 2. As seen therein, $E_{1,1}$ and $H_{1,1}$ are 
both fed back into the $a_1$ element for the subsequent computation. 
In addition, the $H_{1,1}$, the $F_{1,1}$ and the $b_1$ signals are 
transferred to the next processor element, with which $a_2$ is 
permanently associated. 

The next subsequent snapshot of sequence operation is 
shown in FIG. 3, and as illustrated by the dashed line in the 
left most portion of FIG. 3, this snapshot finds the top-most 
sequence processor in the right-most portion of FIG. 3 
operating on the $a_1,b_2$ computation and the processor below 
the first operating on the $a_2,b_1$ computation. Each of these first 
two element processors generates appropriate parameter 
signals required by computations in the next snapshot period, 
which is shown in FIG. 4, with a new value of 
$b_j$ entering the top-most element and the value of $b_{j-1}$ previously 
processed by the top-most element being transferred to the next 
element along with the other required parameters for the 
algorithm. 

This process continues, snapshot after snapshot, as rep- 
resented by FIGS. 5, 6, 7 and 8. This example illustrates that 
the four-by-four matrix of processors for calculating the best 
score of any alignment between sequences A and B in the 
Smith and Waterman algorithm can be achieved with only 
four actual processors operating in an appropriate sequence. 
It, of course, requires the appropriate signals representing 
parameters required by the algorithm to be transferred from 
processor to processor as illustrated in snapshot to snapshot 
sequence of FIGS. 2 to 8. 
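
The snapshot sequence of FIGS. 2 to 8 can be mimicked in software by stepping a chain of processors one clock period at a time. The following Python sketch is illustrative only and is not a description of the actual circuit: each simulated processor holds one element of A, feeds its own H and E values back to itself, and passes the B element together with its H and F values to the next processor. The register names, the zero boundary values for E and F, and the example similarity function are assumptions.

```python
def systolic_best_score(a, b, s, u_e, v_e, u_f, v_f):
    """Simulate a linear array with one processor per element of a."""
    n, m = len(a), len(b)
    h_own = [0] * n    # H fed back within the processor (previous b element)
    e_own = [0] * n    # E fed back within the processor (ASSUMED to start at 0)
    h_diag = [0] * n   # upstream H seen one clock earlier (diagonal term)
    link = [None] * n  # (b element, H, F) latched for the next processor
    best = 0
    for clock in range(n + m - 1):
        new_link = [None] * n
        for i in range(n):
            if i == 0:
                # The first processor reads B serially; boundary H and F are 0.
                incoming = (b[clock], 0, 0) if clock < m else None
            else:
                incoming = link[i - 1]
            if incoming is None:
                continue  # this processor is idle during this clock period
            ch, h_up, f_up = incoming
            e = max(h_own[i] - (u_e + v_e), e_own[i] - v_e)
            f = max(h_up - (u_f + v_f), f_up - v_f)
            h = max(0, h_diag[i] + s(a[i], ch), e, f)
            h_own[i], e_own[i], h_diag[i] = h, e, h_up
            new_link[i] = (ch, h, f)
            best = max(best, h)
        link = new_link
    return best


def example_similarity(x, y):
    """Hypothetical similarity table: +2 for a match, -1 otherwise."""
    return 2 if x == y else -1


print(systolic_best_score("ACGT", "ACGT", example_similarity, 2, 1, 2, 1))
```

In this sketch a processor is simply skipped during clock periods in which no B element has reached it, which corresponds to the cells not yet crossed by a dashed line in the snapshots.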

The signal flow through four processors represented by 
the right-most portion or signal flow graph portion of FIG. 
9, may be used to carry out all the required steps of the 
algorithm for a four-by-four matrix in seven snapshots or 
clock periods represented by the seven dashed lines of the 
left-most portion or systolic processor array portion of FIG. 
9. It will be understood, however, that the four-by-four 
matrix of processors of FIGS. 2-9 is presented herein by 
way of illustration only. It would be highly preferable to 
provide many more than four processors in order to be able 
to compare sequences having a great deal more than just four 
elements. In fact, it will be seen hereinafter that the inte- 
grated circuit (IC) of the present invention provides sixteen 
such processors. In addition, the architecture of each such IC 
permits the serial interconnection of the sixteen processors 
on one chip with the sixteen processors on another chip, so 
that a large number of such processors can be tied together 
from chip to chip to provide a long sequence of intercon- 
nected processors. In the present invention, up to 512 such 
processors can be tied together to form a block and up to 
8,192 such blocks or 4,194,304 such processors can be 
effectively interconnected without external software. The IC 
chip of the present invention, when operating in conjunction 
with other such chips, can compare sequences as long as 
4,194,304 elements without the aid of external software. 
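
The capacity figures above follow from simple multiplication, and the seven clock periods of the four-by-four example generalize in an obvious way. The following Python sketch is illustrative only; the formula in `clock_periods` assumes one processor per element of the stored sequence and one new element of the other sequence entering per clock period, which is an extrapolation from the four-by-four example rather than a statement made in the text.

```python
PROCESSORS_PER_CHIP = 16
PROCESSORS_PER_BLOCK = 512
MAX_BLOCKS = 8192

# Chips needed to form one full block, and the largest array size quoted.
chips_per_block = PROCESSORS_PER_BLOCK // PROCESSORS_PER_CHIP
max_processors = PROCESSORS_PER_BLOCK * MAX_BLOCKS


def clock_periods(n, m):
    """ASSUMPTION: n processors in series, m elements streamed one per clock."""
    return n + m - 1


print(chips_per_block, max_processors, clock_periods(4, 4))
```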

The logical operations actually carried out by each ele- 
ment of the systolic processor array of FIGS. 2-9 may be 
better understood by reference to FIG. 10. In FIG. 10 the 
computations and parameter generation that occur within the 
$a_2,b_2$ processor 11 are shown by way of example. As seen in 
FIG. 10, in each such processor there are four subtractors, an 
adder and three calculators of maximums. The relevant 
equations are: 
$$H_{1,1} = \max\{0,\ H_{0,0} + s(a_1, b_1),\ E_{1,1},\ F_{1,1}\}$$
$$E_{1,1} = \max\{H_{1,0} - (u_E + v_E),\ E_{1,0} - v_E\}$$
$$F_{1,1} = \max\{H_{0,1} - (u_F + v_F),\ F_{0,1} - v_F\}$$

$$H_{2,1} = \max\{0,\ H_{1,0} + s(a_2, b_1),\ E_{2,1},\ F_{2,1}\}$$
$$E_{2,1} = \max\{H_{2,0} - (u_E + v_E),\ E_{2,0} - v_E\}$$
$$F_{2,1} = \max\{H_{1,1} - (u_F + v_F),\ F_{1,1} - v_F\}$$

$$H_{1,2} = \max\{0,\ H_{0,1} + s(a_1, b_2),\ E_{1,2},\ F_{1,2}\}$$
$$E_{1,2} = \max\{H_{1,1} - (u_E + v_E),\ E_{1,1} - v_E\}$$
$$F_{1,2} = \max\{H_{0,2} - (u_F + v_F),\ F_{0,2} - v_F\}$$

$$H_{2,2} = \max\{0,\ H_{1,1} + s(a_2, b_2),\ E_{2,2},\ F_{2,2}\}$$
$$E_{2,2} = \max\{H_{2,1} - (u_E + v_E),\ E_{2,1} - v_E\}$$
$$F_{2,2} = \max\{H_{1,2} - (u_F + v_F),\ F_{1,2} - v_F\}$$


In accordance with these equations, the input parameters for 
the $a_2,b_2$ processor comprise: $H_{2,1}$, $E_{2,1}$, $H_{1,2}$, $F_{1,2}$ and $H_{1,1}$. 
The $H_{2,1}$ parameter is applied to a subtractor to which is also 
applied the value $u_E + v_E$, a constant which may be stored 
within the processor. The parameter $E_{2,1}$ is applied to a 
subtractor to which is also applied the constant value $v_E$. 
$H_{1,2}$ is applied to a subtractor to which is also applied the 
constant $u_F + v_F$ and the parameter $F_{1,2}$ is applied to a 
subtractor to which is also provided the value $v_F$. The 
parameter $H_{1,1}$ is applied to an adder to which is also 
supplied a similarity function of $a_2$ and $b_2$ which, as previously 
indicated, is a constant greater than zero if $a_2$ is equal 
to $b_2$ and a constant less than zero for $a_2$ not equal to $b_2$. 

The outputs of the first two subtractors, that is, the subtractors
to which the parameters $H_{2,1}$ and $E_{2,1}$ are applied,
respectively, are applied to a maximum value calculator. The
output of this maximum value calculator is, by definition,
$E_{2,2}$. The outputs of the other two subtractors are applied to a
separate maximum value calculator, the output of which is,
by definition, the parameter $F_{2,2}$. $E_{2,2}$ and $F_{2,2}$ are applied to
a third maximum value calculator to which is also applied
the output of the adder and a zero signal. The output of this
third maximum calculator is, by definition, $H_{2,2}$, which is the
score of the alignment ending at $a_2,b_2$.
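
The data flow just described for FIG. 10 can be sketched in a few lines of Python. This is an illustrative sketch only, not the circuit itself: the function name `cell`, its argument names and any sample values are assumptions introduced here. For the $a_2,b_2$ processor, `h_diag` stands for $H_{1,1}$, `h_left` and `e_left` for $H_{2,1}$ and $E_{2,1}$, and `h_up` and `f_up` for $H_{1,2}$ and $F_{1,2}$.

```python
def cell(h_diag, h_left, e_left, h_up, f_up, sim, u_e, v_e, u_f, v_f):
    """Software sketch of one FIG. 10 processor step (names are hypothetical).

    Four subtractions, one addition and three maximum calculations,
    mirroring the description above. Returns the tuple (H, E, F).
    """
    # First pair of subtractors feeds the first maximum calculator (E).
    e = max(h_left - (u_e + v_e), e_left - v_e)
    # Second pair of subtractors feeds the second maximum calculator (F).
    f = max(h_up - (u_f + v_f), f_up - v_f)
    # The adder combines the diagonal score with the similarity value.
    diagonal = h_diag + sim
    # Third maximum calculator: adder output, E, F and a zero signal.
    h = max(0, diagonal, e, f)
    return h, e, f
```

Because the zero signal is always an input to the third maximum calculator, the returned H value is never negative, whatever the values of E and F.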

The functional block diagram of a processor of the present 
invention for performing the subtractions, additions and 
maximum calculator functions illustrated in FIG. 10, is 
shown in FIG. 11. As seen in FIG. 11 at the upper left hand
corner thereof, the input parameters are $F_{i-1,j+1}$, $H_{i-1,j+1}$ and
the sequence element $b_{j+1}$. As also seen in FIG. 11, there are
a plurality of registers, namely a register into which the input 
parameters are stored for one clock cycle, as well as registers 
into which parameters generated within the processor of 
FIG. 11 are stored for one clock cycle. The purpose of these 
registers, as will be seen hereinafter, is to provide the 
necessary delays in signal transfer to the adder, subtractors 
and maximum calculators so that the processor carries out its 
algorithmic steps in the proper sequence and at the appro- 
priate time and furthermore, so that the various algorithm 
parameters are available at the appropriate adder, subtractors 
and maximum calculators when the addition, subtractions 
and maximum calculations actually occur. More specifically, 
it will be seen hereinafter that each register of FIG. 11
imparts the appropriate amount of time delay in signal flow
through the processor so that the input of any j parameter
occurs simultaneously with the output of a j-1 parameter.
Thus, for example, the $F_{i-1,j+1}$ parameter is input to a register
10 which, because of its predetermined delay, simultaneously
outputs the parameter $F_{i-1,j}$. Similarly, the input
to register 12, which is $H_{i-1,j+1}$, occurs substantially simultaneously
with the output, which is $H_{i-1,j}$. The outputs of
registers 10 and 12 are applied to subtractors 24 and 26,
respectively, to which are also supplied the constants V and
U+V, respectively. The output of register 12 is also applied
to a register 16, the output of which is $H_{i-1,j-1}$, which is
applied to an adder 28. Also applied to adder 28 is a signal
indicative of the similarity or lack thereof between $a_i$ and $b_j$,
referred to previously in the algorithm as the function
$s(a_i,b_j)$. This similarity value is generated by a similarity
table 14, based upon the $a_i$ stored therein and the $b_j$ input
thereto from a character register 22, the input to which is
$b_{j+1}$.

The outputs of subtractors 24 and 26 are both applied to a
maximum calculator 34, the output of which by definition is
$F_{i,j}$, which is an output signal of the processor of FIG. 11 for
use in the subsequent processor. The output of maximum calculator
34 is also applied to a maximum calculator 36. Other
inputs to maximum calculator 36 include the output of the
adder 28 and a zero signal. The output of maximum calculator
36 is, by definition, the score value signal $H_{i,j}$, which
constitutes the principal information desired from the comparison
of two sequences ending at $a_i$, $b_j$. The output of
maximum calculator 36 is also applied to register 18, the
output of which is thus $H_{i,j-1}$ which is, in turn, applied to the
subtractor 30. Subtractor 30 also receives input U+V. The
output of subtractor 30 is applied to maximum calculator 38,
the output of which, it will be seen hereinafter, is $E_{i,j}$.
Parameter $E_{i,j}$ is applied both to the maximum calculator 36
as an input thereto and also to register 20 in the right-most
portion of FIG. 11, as an input to that register. The output of
register 20 is thus $E_{i,j-1}$, which is applied to subtractor 32 to
which a second input is the constant V. The output of
subtractor 32 is also applied to maximum calculator 38 to
produce the $E_{i,j}$ parameter.
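
The register timing of FIG. 11 can be modelled in software as follows. This is a behavioural sketch under stated assumptions, not the disclosed hardware: the class `Processor`, the functions `run_array` and `simple_sim`, the use of `None` to mark clock cycles that carry no valid character, and the zero initial register contents are all introduced here for illustration. The register numerals in the comments follow the description of FIG. 11.

```python
def simple_sim(x, y):
    """Hypothetical similarity function: positive on a match, negative otherwise."""
    return 2 if x == y else -1


class Processor:
    """Clocked model of one processor holding a single element a_i."""

    def __init__(self, a, sim, u, v):
        self.a, self.sim, self.u, self.v = a, sim, u, v
        self.r10 = 0      # register 10: F[i-1][j]
        self.r12 = 0      # register 12: H[i-1][j]
        self.r22 = None   # character register 22: b[j] (None = no data yet)
        self.r16 = 0      # register 16: H[i-1][j-1]
        self.r18 = 0      # register 18: H[i][j-1]
        self.r20 = 0      # register 20: E[i][j-1]

    def clock(self, f_in, h_in, b_in):
        """Emit (F, H, b) from the register contents, then latch the inputs."""
        f_up, h_up, b = self.r10, self.r12, self.r22
        if b is None:
            out = (0, 0, None)  # nothing valid to compute this cycle
        else:
            f = max(h_up - (self.u + self.v), f_up - self.v)          # max 34
            e = max(self.r18 - (self.u + self.v), self.r20 - self.v)  # max 38
            h = max(0, self.r16 + self.sim(self.a, b), e, f)          # max 36
            self.r16, self.r18, self.r20 = h_up, h, e
            out = (f, h, b)
        self.r10, self.r12, self.r22 = f_in, h_in, b_in
        return out


def run_array(a_seq, b_seq, sim, u, v):
    """Stream b_seq through a chain of processors, one per element of a_seq.

    Returns rows[i][j], the H value for the pair (a_seq[i], b_seq[j]).
    """
    procs = [Processor(a, sim, u, v) for a in a_seq]
    rows = [[] for _ in a_seq]
    for t in range(len(a_seq) + len(b_seq)):
        # Row 0 of the matrix is all zeros, so the first processor is fed
        # F = H = 0 together with each successive character of b_seq.
        sig = (0, 0, b_seq[t]) if t < len(b_seq) else (0, 0, None)
        for i, proc in enumerate(procs):
            # Each output depends only on values latched on earlier clocks,
            # so visiting the processors in order models a synchronous array.
            sig = proc.clock(*sig)
            if sig[2] is not None:
                rows[i].append(sig[1])
    return rows
```

A call such as `run_array("ACGT", "AGCT", simple_sim, 2, 1)` returns one row of H values per processor; in this model each processor produces its first valid output one clock cycle after its predecessor.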

Thus it will be seen that the architecture depicted in FIG.
11 carries out the various computations of a single processor
for comparing two elements of the sequences A and B in
accordance with the Waterman and Smith algorithm, including
providing the necessary time delay registers, subtractors, 
adder and maximum calculators to receive the appropriate 
parameters and to generate the parameters for the subse- 
quent processor which, in turn, computes the same type of 
information for two sequence characters. It will be under- 
stood that the block diagram of FIG. 11 is of a functional 
nature only, to indicate the treatment of parameters that 
occur within one processor. However, the actual implemen- 
tation of a processor is illustrated in FIGS. 12 and 13 taken 
in combination. Reference will now be made to FIGS. 12 
and 13 for a more detailed understanding of the actual 
architecture of a processor of the present invention. 

The principal differences between the functional block 
diagram of FIG. 11 and the actual block diagram of FIGS.
12 and 13 are the following: Subtractors of FIG. 11 are
actually adders with one of the inputs inverted prior to
application to the adder, so that the equivalent operation is
a subtraction. Another distinction is that maximum calculators
only accept two values; consequently, there are more
maximum calculators in the actual implementation of FIGS. 
12 and 13 than there are in the functional block diagram of 
FIG. 11. Still another distinction between the functional
block diagram and the actual block diagram of the processor
of the present invention is that the latter must
incorporate signals, in addition to the parameter
signals previously discussed in conjunction with FIG. 11,
which must be input and output to permit proper interface from
processor to processor, as well as to facilitate appropriate
timing of operation. In addition, there are at least two
additional capabilities in the actual block diagram of FIGS.
12 and 13 as compared to the functional block diagram of
FIG. 11. Specifically, in the actual block diagram, an additional
maximum calculator is provided which compares the
value of $H_{i,j}$ to a preselected threshold value, permitting the
logic of the actual processor to ignore any scores which fall
below the preset threshold value. In addition, the actual
architecture of the processor of the present invention provides
an additional signal path through all processors in a
block, as well as an additional maximum calculator in each
processor of a block, for comparing the maximum value of
each processor with the maximum value of every other
processor and propagating a signal which indicates when the
maximum value of this particular processor is in fact the
highest $H_{i,j}$ of all of the processors in the block.
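
The threshold comparison and the propagation of a block maximum can be sketched as a running comparison passed from processor to processor. This is a simplified illustration only: the function name `block_maximum`, the list representation of the per-processor maxima and the rule that the earliest processor wins a tie are assumptions made here, not details taken from FIGS. 12 and 13.

```python
def block_maximum(local_maxima, threshold):
    """Pass a running (value, location) pair through the processors of a block.

    local_maxima[k] is the highest H value seen by processor k (hypothetical
    representation). Returns (value, location), or None if no score exceeds
    the threshold.
    """
    best_value, best_location = threshold, None  # nothing above threshold yet
    for location, local_max in enumerate(local_maxima):
        # Per-processor maximum calculator: keep the larger of the incoming
        # maximum and this processor's own maximum, and note where it arose.
        if local_max > best_value:
            best_value, best_location = local_max, location
    if best_location is None:
        return None  # every score was at or below the threshold and is ignored
    return best_value, best_location
```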

Furthermore, it will be seen that in the block diagram of 
the actual processor of the present invention, the similarity 
table of the functional block diagram of FIG. 11 comprises
a random access memory. The data bus of the chip
brings the character data into the similarity RAM, where it
can be either written into the RAM or read out of the RAM,
and $b_j$ is applied to the address terminal of the RAM. In
addition, the similarity RAM is provided with a chip select 
signal and a read/write signal as well as a data output which 
provides the similarity function output from a look-up table 
in the similarity RAM. A table address signal (TA) is also 
applied to the address terminal of the similarity RAM 
through a multiplexer as a high order five byte address for 
the similarity RAM table. 

Other signals used in the block diagram of FIGS.
12 and 13 include location input and location output, which
provide an indication of the location of the current maximum
value in the block of processors. Maximum enable
input and maximum enable output signals enable the comparison
of the locally generated maximum value with the
input maximum value in each processor. A pipeline enable
signal is used and its state indicates when the $F_{i,j}$ and $H_{i,j}$
values are valid data so that these values can be saved.
Synchronous clear signals are also input and output to each
processor. The synchronous clear input resets the $H_{i,j}$ value
so that the maximum value does not exceed the threshold 
value and the synchronous clear output, under certain 
conditions, namely when the maximum value generated is 
greater than the threshold value, sets the H value of the next 
processor to zero. However, it will be understood that except 
for the timing control and logic control, the use of threshold 
and maximum value transfer from processor to processor, 
the functional effect of the actual architecture depicted in 
FIGS. 12 and 13 is identical to that explained previously in 
conjunction with FIG. 11. 

The manner in which the processors are integrated in a 
chip of the present invention and the other electronics 
associated with each circuit chip of the present invention 
will now be discussed in conjunction with FIGS. 14 and 15 
which together comprise a functional block diagram of the 
biological information signal processor. Referring therefore 
now to FIGS. 14 and 15, it will be seen that each integrated 
circuit chip of the present invention comprises sixteen of the 
aforementioned processors connected in a serial array con- 
figuration in which a plurality of the aforementioned signals 
used within each processor, may be transferred from pro- 
cessor to processor on this particular chip, as well as to 
processors on other chips to which the present chip is
connected. As previously indicated, without the aid of
external software, up to 512 processors may be interconnected
to form what is called a block and up to 8,192 such
blocks may be interconnected without external software to
handle one sequence.

All of the other elements of a signal processor of the 
present invention are designed to provide the requisite 
information, timing and signal flow input to and generated 
by the processors. Thus for example in the upper left-hand 
corner of FIG. 14, there is shown a plurality of registers 
which are loaded from a data bus to provide the U+V and V 
constants which are needed in all of the processors and 
which represent various values of a linear function, representing
scoring penalties for insertions and deletions in the
Smith and Waterman Algorithm.

Also provided in the integrated circuit chip of the present 
invention is a control logic device which controls the 
application of timing and logic signals to the processors, as 
well as signals which enable block and sequence counters, 
the outputs of which are stored in a maximum memory
device shown in the upper right-hand corner of FIG. 15. The
control logic also controls pause input and output signals
which are used under certain conditions for temporarily
halting the operation of the processors, such as when maximum
memory is filled. The processor of the present invention
also provides means for loading a threshold into the chip
and for utilizing this threshold for enabling storage of
maximums into memory only when the threshold is
exceeded. The threshold registers are shown in the upper
left-hand corner of FIG. 15. There is a preload threshold
register which receives its input from the data bus and a
sequence threshold register which receives its input from the
character port when the chip is to be loaded with a query
sequence threshold. Also provided is an adder which adds
the sequence threshold and the preload threshold to provide
what is referred to as a real threshold against which the
scores of the respective processors are compared in a threshold
comparator. A pair of counters is also provided, namely
a block counter and a sequence counter. These counters
enable the maximum memory to correlate the maximum
score value with the sequence and the user defined block. A 
physical representation of the layout of the integrated circuit 
chip of the present invention is shown in FIG. 16. 
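
The threshold arithmetic and the gating of the maximum memory described above may be illustrated by the following sketch. The function names, the tuple layout of each stored entry and the sample numbers in the usage note are assumptions made for illustration; only the addition of the two thresholds and the storage of maximums that exceed the result come from the description.

```python
def real_threshold(preload_threshold, sequence_threshold):
    """Adder: the real threshold is the sum of the two loaded thresholds."""
    return preload_threshold + sequence_threshold


def store_maxima(events, preload_threshold, sequence_threshold):
    """Keep only the maxima that exceed the real threshold.

    events is an iterable of (sequence_count, block_count, score) tuples, a
    hypothetical stand-in for the counter outputs and the block maximum.
    """
    threshold = real_threshold(preload_threshold, sequence_threshold)
    memory = []
    for sequence_count, block_count, score in events:
        # Threshold comparator: storage is enabled only above the threshold.
        if score > threshold:
            memory.append((sequence_count, block_count, score))
    return memory
```

With a preload threshold of 20 and a sequence threshold of 5, for instance, a score of exactly 25 is not stored in this sketch, because the comparison is strict.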

The sixteen processors are arranged in a serial array
terminating in a pipeline register. The device in the upper
left-hand corner of FIG. 16 is a control block which com- 
prises the control logic, counters and registers previously 
described in conjunction with FIGS. 14 and 15. 

The interface between integrated circuit chips of the
present invention may be best understood by referring to
FIGS. 17 and 18 which provide an exemplary dependence
graph for 34 processors on three separate chips, the latter
being shown on the right side of FIG. 18. Each chip provides
16 processors and a pipeline register. In the dependence
graph the pipeline registers are shown as rectangles which
merely delay the operation between the last processor of one
chip and the first processor of the next chip.

The dependence graph of FIGS. 17 and 18 is generally a
larger matrix version of the graphs of FIGS. 1-9, except that
it includes a sufficient number of processors to demonstrate
the “block edge” behavior based upon a minimum block size
of 16 elements. This “block edge” behavior is designed to
prevent maximum score buffer overflow by resetting “H”
values in the $a_{16},b_{16}$ processor, the $a_{32},b_{32}$ processor, etc.
Only the “H” values which exceed the previously noted
threshold and which are output in the horizontal and diagonal
directions to the adjacent processors are reset.

This “block edge” resetting procedure constitutes a modification
to the Smith and Waterman algorithm which is
unique to the present invention. It is implemented in each
chip by means of a boundary set zero enable signal (ENZ
flag) in the control logic of FIG. 14. If this bit is set and the
output H value is greater than the threshold value, then the
SISP chip will reset the internally fed-back E value and the
H value of the next SISP chip.
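
The ENZ rule stated above can be written as a small decision function. This is a sketch only: the name `block_edge` and its arguments are hypothetical, and it assumes that "reset" means forcing both values to zero, which the description states for the H value of the next processor but not explicitly for the fed-back E value.

```python
def block_edge(h_out, e_feedback, threshold, enz):
    """Return (H passed to the next chip, E fed back internally).

    Assumption: a reset forces both values to zero. The reset happens only
    when the ENZ flag is set and the output H exceeds the threshold.
    """
    if enz and h_out > threshold:
        return 0, 0
    return h_out, e_feedback
```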

It will now be understood that what has been disclosed
herein comprises a sequence information signal processing
integrated circuit chip designed to perform high speed
calculation based upon the dynamic programming algorithm
defined by Waterman and Smith. This chip is designed to be
a building block of a linear systolic array. The performance
of the systolic array can be increased by connecting additional
such chips to the array. Each such chip provides
sixteen processor elements, a 128 word similarity table in
each processor element, user definable query threshold and
preload threshold and block maximum value and location
calculation and buffering. The chip provides the equivalent
of about 400,000 transistors or 100,000 gates. All numerical
data are input in 16 bit, two's complement format, and result
in comparison scores ranging from +32,767 to -32,768. A
control logic device in the chip performs the control and
sequencing of the processor elements. It contains threshold
logic for sequence and timing, as well as enabling counters
for sequence and block counts.
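
The stated score range follows directly from the 16 bit two's complement format, as the following sketch shows. The helper name `to_int16` is an assumption introduced here; it simply interprets a 16 bit word as a signed value.

```python
def to_int16(word):
    """Interpret the low 16 bits of an integer as a two's complement value."""
    word &= 0xFFFF
    # Words with the top bit set represent negative numbers.
    return word - 0x10000 if word & 0x8000 else word


HIGHEST_SCORE = to_int16(0x7FFF)  # largest positive 16 bit value
LOWEST_SCORE = to_int16(0x8000)   # most negative 16 bit value
```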

Those having ordinary skill in the arts relevant to the
present invention will now, as a result of applicants’ teaching
herein, perceive various modifications and additions
which may be made to the invention. By way of example,
the particular algorithm as well as the architecture designed
to perform the algorithm processes, may be altered while
still providing a useful and accurate measure of the homology
of two or more data sequences or subsequences thereof.
Accordingly, all such modifications or additions are deemed
to be within the scope of the invention which is to be limited
only by the claims appended hereto.

We claim: 

1. An electronic circuit for use in comparing two
sequences of elements to determine which alignment of the
sequences produces the greatest similarity between the
sequences, the circuit comprising:

multiple processors connected in series and individually
configured to:

compare an element in one of the sequences with
successive elements in the other sequence,
for each pair of elements compared, generate a scoring
parameter indicating which of a plurality of segments
ending at those elements produces the greatest
degree of similarity between the sequences,
use the scoring parameter to generate another scoring
parameter for the next pair of elements compared,
and
deliver the scoring parameter to another processor in
the series for use in generating another scoring
parameter for another pair of elements,

threshold circuitry configured to determine which processor
produces the scoring parameter with the highest
value, and

alignment circuitry configured to determine which align- 
ment of the sequences is associated with the scoring 
parameter having the highest value. 


2. The electronic circuit of claim 1, wherein each proces- 
sor is configured to deliver the scoring parameter to the next 
processor in the series. 

3. The electronic circuit of claim 1, wherein all of the 
processors, except a final processor in the series, are con- 
figured to deliver the scoring parameter to another processor. 

4. The electronic circuit of claim 1, further comprising 
adjustment circuitry configured to adjust the scoring param- 
eters when two segments differ because one or more dele- 
tions appear in one of the segments. 

5. The electronic circuit of claim 4, wherein the adjust- 
ment circuitry is configured to adjust the scoring parameters 
by a value that depends on which of the segments contains 
the deletion. 

6. The electronic circuit of claim 1, further comprising 
adjustment circuitry configured to adjust the scoring param- 
eters when two segments differ because one or more inser- 
tions appear in one of the segments. 

7. The electronic circuit of claim 6, wherein the adjust- 
ment circuitry is configured to adjust the scoring parameters 
by a value that depends on which of the segments contains 
the insertions. 

8. The electronic circuit of claim 1, wherein the proces- 
sors are configured to generate scoring parameters concur- 
rently and each concurrently generated scoring parameter 
represents a comparison of segments ending at different 
elements in the sequences. 

9. The electronic circuit of claim 1, wherein the sequences
are represented as $A = a_1, a_2, \ldots, a_n$ and $B = b_1, b_2, \ldots, b_m$,
and wherein each processor is configured to generate the
scoring parameter associated with any two elements $a_i$ and
$b_j$, respectively, according to the following equations:

\[
H_{i,j} = \max\{0,\; H_{i-1,j-1} + s(a_i,b_j),\; E_{i,j},\; F_{i,j}\}
\]

where

\[
\begin{aligned}
E_{i,j} &= \max\{H_{i,j-1} - (U_E + V_E),\; E_{i,j-1} - V_E\} \\
F_{i,j} &= \max\{H_{i-1,j} - (U_F + V_F),\; F_{i-1,j} - V_F\} \\
H_{i,0} &= H_{0,j} = 0 \\
s(a_i,b_j) &> 0 \text{ if } a_i = b_j \\
s(a_i,b_j) &< 0 \text{ if } a_i \neq b_j
\end{aligned}
\]

and $U_E$, $V_E$, $U_F$ and $V_F$ are selected constants.

10. The electronic circuit of claim 9, wherein each processor
is configured to generate all three values $H_{i,j}$, $E_{i,j}$, and
$F_{i,j}$ for two elements $a_i$, $b_j$.

11. The electronic circuit of claim 9, wherein each processor
is configured to receive the values $H_{i-1,j}$ and $F_{i-1,j}$
from a preceding processor in the series.

12. The electronic circuit of claim 9, further comprising a
memory device that stores a table from which the values for
$s(a_i,b_j)$ are derived.

13. The electronic circuit of claim 1, wherein each pro- 
cessor stores a single element from one of the sequences and 
compares this element to all other elements in the other 
sequence. 

14. The electronic circuit of claim 13, wherein each 
processor generates a scoring parameter for each compari- 
son of the stored element with another element. 





